Characterization and individual-level prediction of cognitive state in the first year after ‘mild’ stroke

Background Mild stroke affects more than half the stroke population, yet there is limited evidence characterizing cognition over time in this population, especially with predictive approaches applicable at the individual level. We aimed to identify patterns of recovery and the best combination of demographic, clinical, and lifestyle factors predicting individual-level cognitive state at 3- and 12-months after mild stroke. Methods In this prospective cohort study, the Montreal Cognitive Assessment (MoCA) was administered at 3–7 days, 3- and 12-months post-stroke. Raw changes in MoCA and impairment rates (defined as MoCA<24 points) were compared between assessment time-points. Trajectory clusters were identified using variations of ≥1 point in MoCA scores. To further compare clusters, additional assessments administered at 3- and 12-months were included. Gamma and quantile mixed-effects regression were used to predict individual MoCA scores over time, using baseline clinical and demographic variables. Model predictions were fitted for each stroke survivor and evaluated using model cross-validation to identify the overall best predictors of cognitive recovery. Results Participants’ (n = 119) MoCA scores improved from baseline to 3-months (p<0.001) and decreased from 3- to 12-months post-stroke (p = 0.010). Cognitive impairment rates decreased significantly from baseline to 3-months (p<0.001), but not between 3- and 12-months (p = 0.168). Nine distinct trajectory clusters were identified. At each time-point, clusters differed in cognitive outcomes but not in clinical and/or activity participation outcomes. Cognitive performance at 3- and 12-months was best predicted by younger age, higher physical activity levels, and left-hemisphere lesion side. 
Conclusion More than half of mild-stroke survivors are at risk of cognitive decline one year after stroke, even when preceded by a significantly improving pattern in the first 3-months of recovery. Physical activity was the only modifiable factor independently associated with cognitive recovery. Individual-level prediction methods may inform the timing and personalized application of future interventions to maximize cognitive recovery post-stroke.


Introduction
Mild stroke affects more than half the stroke population [1]. Although there is currently no internationally recognized consensus on a measurement tool and cutoff score [2], clinicians and researchers commonly identify patients with mild stroke using a cutoff score between 5 and 8 points on the National Institutes of Health Stroke Scale (NIHSS) [3,4].
Cognitive changes are common after mild stroke, yet often overlooked and under-diagnosed [5]. Mild stroke is associated with major difficulties in common but complex activities (e.g. work, driving) that require the use of cognitive skills [6]. The Montreal Cognitive Assessment (MoCA) is a valid and reliable tool used to evaluate post-stroke cognition [7]. Studies using the MoCA frequently describe associations between cognition and clinical characteristics (e.g. risk factors, comorbidities) or routine laboratory blood work to guide interventions [8][9][10]. However, the evidence supporting those relationships is often limited in its interpretation. These limitations include the simplification of skewed data into impaired/non-impaired groups and/or the reduction of longitudinal outcomes to cross-sectional sub-analyses (e.g. [11,12]).
Recent systematic evidence has provided an overall description of the quantitative changes in cognition after stroke in the short and long term, both in the absence and presence of interventions other than usual care [13]. We have also started to unveil ways in which post-stroke disability can be predicted at the individual level in mild stroke, using machine learning algorithms [14]. Despite these advances, similar machine learning approaches for cognition in mild stroke patients are still lacking.
A range of factors has the potential to influence cognition and its trajectory post-stroke. These include: demographic factors, such as age, education and gender [15]; clinical factors, including initial severity of cognitive impairment and stroke severity [16]; neurological factors, including lesion side and volume [17]; interventions such as tissue plasminogen activator (i.e. tPA or alteplase), anticoagulant or antiplatelet medications and their association with hemorrhagic transformation [18]; modifiable risk factors such as diet and lifestyle [19]; and blood-based biomarkers (e.g. cholesterol, vitamin B12, vitamin D, and inflammatory biomarkers) [10,20,21]. In the present study we focus on three main predictor categories: demographic characteristics, clinical characteristics, and lifestyle factors, given the consistent evidence found for these variables, the available data, and the potential to identify modifiable factors in this cohort of mild stroke survivors.
restrictions on sharing a de-identified data set. The primary contact for data access is Prof. Leeanne Carey (L.Carey@latrobe.edu.au). The best contact outside our core research team for data access queries related to the START cohort study is our hospital ethics department: ethics@austin.org.au.
To improve the characterization and precision of currently used methods for predicting cognition after stroke, we analyzed the raw scores of the MoCA, a multi-domain cognitive assessment. Our aims were: (i) to characterize cognitive trajectory pathways according to patterns of change in MoCA scores, and (ii) to find the best overall combination of available baseline variables (i.e. evaluated at 3-7 days post mild stroke) predicting future cognitive state at 3 and 12 months at the individual-patient level.

Study design
The STroke imAging pRevention and treatment (START) study was a multisite investigation including two sub-studies: the START-EXtending the time for Thrombolysis in Emergency Neurological Deficits (EXTEND) randomized placebo-controlled trial [22]; and the START-Prediction and Prevention to Achieve Optimal Recovery Endpoints after stroke (PrePARE) cohort study [23]. Participants were recruited consecutively from June 10th, 2010 until July 4th, 2014 for the PrePARE cohort, and until June 2018 for the EXTEND trial [24]. START was registered with the Australian New Zealand Clinical Trials Registry (www.anzctr.org.au; ID# ACTRN12610000987066). Central ethical approval for this study was obtained from the Melbourne Health Human Research Ethics Committee (2009.079) and the Austin Health Human Research Ethics Committee (H2010/03588). Written consent was obtained directly from the patient, a family member or a legally responsible other.
The final results of the EXTEND trial were published in 2019 [24]. The current investigation involves analysis of cognition in both the START-EXTEND and START-PrePARE cohorts over the first year post-stroke, using baseline demographic, clinical, and lifestyle factors as predictors of cognition at 3- and 12-months post-stroke.

Participants
We included participants aged ≥18 years, English-speaking, with a diagnosis of mild ischemic stroke (NIHSS ≤8 points, based on a previously used clinical categorization of stroke severity) [4], no prior disability (modified Rankin Scale, mRS ≤2 points), and able to undertake a baseline cognitive assessment 3-7 days post-stroke.

Assessment
Evaluations were carried out at baseline, 3- and 12-months post-stroke. Assessments were conducted in person by trained, blinded evaluators, in a clinic testing room or at the participant's home.
The primary outcome was general cognitive functioning, as evaluated by the MoCA [7]. Sensitivity and specificity for this assessment are available for Australia/New Zealand stroke patients, with <24 points indicating cognitive impairment [25].

Baseline assessment
Clinical and demographic variables recorded at baseline (Table 1) and used in our predictive analysis included: age, sex, ethnicity, marital status, and prior medical conditions; prior level of physical activity (Rapid Assessment of Physical Activity, RAPA) [26]; depression (Montgomery-Åsberg Depression Rating Scale, MADRS) [27]; stroke severity (National Institutes of Health Stroke Scale, NIHSS); and putative risk factors (smoking, body mass index, and blood pressure), selected based on documented associations of these variables with cognitive recovery [28,29].

Risk of bias
Assessors were blinded to study outcomes [22,23]. All participants were tested at their recruiting hospital or at home by trained health professionals. Associations between the MoCA and explanatory variables (predictors) at baseline were adjusted statistically to control for possible confounders/effect modifiers in the model validation process, as detailed in our statistical analysis. In addition, we conducted comparative post-hoc analyses to study potential effect modification from the use of antiplatelet, thrombolytic (tPA), or anticoagulant medication during the course of the study.

Statistical analysis
Baseline characteristics between included and excluded participants were summarized and compared using Wilcoxon rank-sum or Fisher's exact tests, to compare continuous and categorical variables, respectively.
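As an illustration of the categorical comparison, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test for a 2×2 table (for example, included versus excluded participants by a binary characteristic). It is not the authors' R code: the function name and table layout are hypothetical, and the sketch covers only the 2×2 case, summing the probabilities of all tables with the same margins that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = included / excluded participants,
    columns = characteristic present / absent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table no more probable than the observed one; the small relative
    # tolerance keeps floating-point ties from being dropped.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


print(fisher_exact_two_sided(1, 9, 11, 3))  # roughly 0.00276
```

In practice, R's `fisher.test` or `scipy.stats.fisher_exact` would normally be used; the sketch only makes the computation explicit.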

Characterizing cognitive recovery pathways
Changes in cognitive impairment rates between time-points were compared with McNemar's χ² test [40]. Differences in MoCA scores between time-points were compared using the Wilcoxon-Pratt signed-rank test [41]. This analysis was conducted both for the whole sample and, subsequently, for survivors displaying the same cognitive trajectory pattern (i.e. belonging to the same trajectory cluster, as explained below).
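The two paired tests above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the Wilcoxon-Pratt statistic uses the normal approximation without continuity correction, the McNemar test applies a continuity correction by default, and all function names are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_pratt(before, after):
    """Signed-rank test on paired scores, zero differences handled by Pratt's method.

    Zero differences are ranked together with the non-zero |differences|, but their
    ranks are left out of the statistic. Returns (W+, z, two-sided p) from the
    normal approximation.
    """
    d = [y - x for x, y in zip(before, after)]
    r = average_ranks([abs(v) for v in d])
    w_plus = sum(rk for rk, v in zip(r, d) if v > 0)
    n, n0 = len(d), sum(1 for v in d if v == 0)
    mean = (n * (n + 1) - n0 * (n0 + 1)) / 4
    var = (n * (n + 1) * (2 * n + 1) - n0 * (n0 + 1) * (2 * n0 + 1)) / 24
    tie_sizes = {}
    for rk, v in zip(r, d):
        if v != 0:  # tie correction uses the non-zero differences only
            tie_sizes[rk] = tie_sizes.get(rk, 0) + 1
    var -= sum(t ** 3 - t for t in tie_sizes.values()) / 48
    z = (w_plus - mean) / sqrt(var)
    return w_plus, z, erfc(abs(z) / sqrt(2))


def mcnemar(impaired_t1, impaired_t2, continuity=True):
    """McNemar's chi-square (1 df) for paired impaired/unimpaired classifications."""
    pairs = list(zip(impaired_t1, impaired_t2))
    b = sum(1 for x, y in pairs if x and not y)  # impaired -> unimpaired
    c = sum(1 for x, y in pairs if y and not x)  # unimpaired -> impaired
    diff = max(abs(b - c) - (1 if continuity else 0), 0)
    chi2 = diff ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

For instance, `mcnemar([s < 24 for s in baseline], [s < 24 for s in month3])` would compare impairment rates (MoCA <24) between two time-points, where `baseline` and `month3` are hypothetical lists of paired MoCA scores.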
Trajectory clusters were created using MoCA score variations of ≥1 point between consecutive assessments (i.e. each interval was classified as an improvement, a decline, or no change). The resulting clusters were then described and compared to look for differences in demographic and clinical profiles at each time-point. We also identified participants with a clinically significant improvement (≥2 points) in MoCA scores between assessments, as defined by previous evidence [42].
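One way to read the clustering rule is sketched below: each of the two intervals (baseline to 3-months, 3- to 12-months) is labelled improved, stable, or declined using the 1-point threshold, giving 3 × 3 = 9 possible clusters. The numeric ordering of clusters is an assumption, chosen so that cluster 1 is improved-improved, cluster 3 improved-declined, cluster 6 stable-declined and cluster 9 declined-declined, consistent with the labels used in the Results; the function and field names are hypothetical.

```python
def direction(change: int) -> str:
    """Classify one between-assessment MoCA change with a 1-point threshold."""
    if change >= 1:
        return "improved"
    if change <= -1:
        return "declined"
    return "stable"


# Assumed numbering of the 3 x 3 direction pairs (first interval, second interval).
LEVELS = ("improved", "stable", "declined")
CLUSTER_ID = {
    (first, second): 3 * i + j + 1
    for i, first in enumerate(LEVELS)
    for j, second in enumerate(LEVELS)
}


def trajectory(baseline: int, m3: int, m12: int) -> dict:
    """Assign a participant to one of nine trajectory clusters."""
    first, second = direction(m3 - baseline), direction(m12 - m3)
    return {
        "cluster": CLUSTER_ID[(first, second)],
        "label": f"{first}-{second}",
        # Clinically significant improvement: gain of >= 2 points within an interval.
        "sig_gain_0_3": m3 - baseline >= 2,
        "sig_gain_3_12": m12 - m3 >= 2,
    }


print(trajectory(23, 28, 24))
# {'cluster': 3, 'label': 'improved-declined', 'sig_gain_0_3': True, 'sig_gain_3_12': False}
```

With the hypothetical scores 23, 28 and 24, the participant falls in cluster 3 ("improved-declined") and shows a clinically significant gain only in the first interval.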

Identifying predictors of cognition
A visual guide explaining the variable selection process we completed as part of our exploratory analyses is depicted in Fig 1. The selection of potential variables for our final model validation involved an exploratory phase in which we employed unadjusted and adjusted analyses, using groups of variables from three predictor categories (i.e. demographic characteristics, clinical characteristics, and lifestyle factors). The skewed distribution of the primary outcome (i.e. overall MoCA score) led us, in the first instance, to select binary and quantile regression, in line with previous literature [11,12]. For these models, each baseline variable from Table 1 was studied, one at a time, to explore its predictive association with cognition at each time-point (i.e. baseline, 3 and 12-months). Two additional model formulations were used in this exploratory phase to account for the distribution of our outcome data and the repeated measurements: linear quantile mixed models (LQMM) [43] and Gamma-distributed generalized linear mixed models (henceforth called "mixed Gamma models") [44,45]. These two modeling approaches take into account both the skewed nature of the overall MoCA scores and the follow-up (i.e. longitudinal) measurements recorded for each participant. For each model formulation, we conducted additional exploratory, unadjusted, bivariate analyses, using overall MoCA scores (all 3 time-points at once) as the outcome (i.e. dependent) variable and one predictor (i.e. independent) variable from Table 1, introduced one at a time. The same models were then re-run adjusting for baseline MoCA score to predict subsequent MoCA scores at 3 and 12-months post-stroke.

Identifying the 'best' combinations of predictive variables for cognition, based on individual-patient prediction
After finalizing our exploratory analyses, a set of candidate multivariable formulae was constructed, combining the baseline variables that showed a significant association with MoCA in the unadjusted (i.e. bivariate) and adjusted analyses (S1 Table). These formulae were limited to a maximum of three predictors (i.e. one categorical plus two continuous), adjusting for baseline MoCA scores each time. Pseudo-replication was controlled for by including a random intercept for each participant (i.e. a random effect) in all models. Each model was evaluated using leave-one-out cross-validation, the special case of 'k-fold' cross-validation in which k equals the number of participants [46]: the model is fitted iteratively after removing one participant in each iteration, and the resulting model is then used to predict the held-out individual's scores. This way, each individual has the opportunity to be the 'test' data.
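The leave-one-out loop described above can be sketched as follows. To keep the example self-contained, a deliberately simple stand-in model is used: an ordinary least-squares line for the change from baseline MoCA as a function of one predictor. This is not the mixed Gamma or quantile model used in the study, and all names are hypothetical; only the structure of the loop (refit without one participant, predict that participant, repeat) is the point.

```python
from statistics import mean


def fit_change_model(rows):
    """Hypothetical stand-in model: least-squares line for (follow-up - baseline) vs. one predictor.

    Each row is (baseline_moca, predictor_value, followup_moca).
    """
    xs = [x for _, x, _ in rows]
    changes = [y - b for b, _, y in rows]
    mx, my = mean(xs), mean(changes)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (c - my) for x, c in zip(xs, changes))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope  # (intercept, slope)


def predict(model, baseline, x):
    intercept, slope = model
    # Predicted follow-up = baseline + predicted change, kept inside the 0-30 MoCA range.
    return min(30.0, max(0.0, baseline + intercept + slope * x))


def leave_one_out(rows):
    """Refit without participant i, then predict participant i; repeat for everyone."""
    predictions = []
    for i, (baseline, x, _) in enumerate(rows):
        training = rows[:i] + rows[i + 1:]
        predictions.append(predict(fit_change_model(training), baseline, x))
    return predictions
```

Swapping `fit_change_model` for a mixed-model fit would keep the same loop; each participant's prediction comes from a model that never saw that participant's data.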
After predicting scores for each individual with all combinations of candidate baseline variables, we compared predicted and observed scores for each individual. Overall predictive performance was quantified, separately for each model, as the Pearson correlation between predicted and observed scores across all individuals; the model with the highest correlation was considered the best.
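Scoring and ranking candidate formulae by the correlation between cross-validated predictions and observed scores might look like the sketch below. The candidate names and all numbers are hypothetical placeholders, not study data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def best_model(predictions_by_model, observed):
    """Return (name, r) for the candidate whose predictions correlate best with observed."""
    scores = {name: pearson(preds, observed) for name, preds in predictions_by_model.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


observed = [21, 24, 27, 29]  # hypothetical observed follow-up scores
candidates = {  # hypothetical cross-validated predictions per candidate formula
    "age + RAPA + lesion side": [22, 24, 26, 28],
    "age + smoking": [25, 23, 26, 24],
}
print(best_model(candidates, observed))
```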

Power calculations
Power in the original study protocol was based on a model with 200 participants and seven predictors at medium predictive capacity (R² = 0.5) [23]. Given our smaller sample size, in our pre-analysis phase we ran a simulation-based power analysis for a Gamma-distributed generalized linear mixed model (GLMM) with a maximum of five predictors, 0.8 power, and alpha set at 0.05. Subsequently, in our main analysis, we used a conservative approach and limited the number of predictors to a maximum of four variables (three baseline predictors, all adjusted for baseline MoCA).
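The logic of a simulation-based power analysis can be sketched, under heavy simplification, with a single-predictor linear model instead of a Gamma GLMM: simulate many datasets under an assumed effect, test each one, and report the rejection rate. The effect size, noise level and the z approximation to the slope test are all assumptions; this is not the authors' simulation.

```python
import random
from math import sqrt


def simulated_power(n, beta, sigma=1.0, n_sims=1000, seed=1):
    """Monte Carlo power for the slope of a simple linear regression.

    Hypothetical simplification: one Gaussian predictor, Gaussian errors, and a
    normal (z) approximation to the slope test at alpha = 0.05 (two-sided).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [beta * x + rng.gauss(0, sigma) for x in xs]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        resid = [y - my - slope * (x - mx) for x, y in zip(xs, ys)]
        se = sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
        if abs(slope / se) > 1.96:
            rejections += 1
    return rejections / n_sims
```

Under no effect (`beta=0`) the rejection rate should sit near 0.05; increasing `beta` or `n` until the rate reaches 0.8 mirrors, in this simplified setting, the kind of question the pre-analysis simulation answered.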

Model external validation
Our modelling approach was externally validated on an independent mild stroke cohort from Singapore evaluated at admission (i.e. within the first 24 hours after the presentation of stroke symptoms), and at 3- and 12-months post-stroke [47]. Variables common to the two cohorts were mapped and then used as baseline predictors. Models in the Singapore cohort were tested using the same 'k-fold' validation and model selection methods described above. Comparisons between these two cohorts, and external validation results, can be found in S1 and S4 Tables and S1 and S2 Figs.

Software and predictive algorithms
All analyses and figures were completed using R v.3.5 [48]. The lqmm [43] and lme4 [45] packages were used to fit mixed quantile and Gamma models, respectively. Prediction of individual MoCA scores with lqmm was achieved by adapting the package's original source code (lqmm's 'predict' function only predicts outcomes for the full sample and not for one individual; see our R code for full details on this step). The full descriptive and predictive analysis plan and corresponding R code for this study are available at https://github.com/jpsaa/saa_cog_recovery_2023.

Cohort characteristics
One hundred and forty-four participants met entry criteria for this study; twenty-five (17%) were lost to follow-up. Of those lost to follow-up (LTFU), 10 and 15 were not evaluated at 3- and 12-months, respectively, leaving a total of 119 individuals (83% of the enrolled sample) with complete data for analysis. Comparisons between included and excluded individuals (Table 1) did not reveal differences in any demographic or clinical characteristics at baseline.

Domain-specific cognitive performance over time
A median overall increase of 2 points (interquartile range, IQR = 3.5) was observed from baseline to 3-months (p<0.001, Table 2), and a median decline of 1 point (IQR = 3) was observed from 3- to 12-months (p = 0.001). Executive/visuospatial functions, attention, language, abstraction, and delayed recall improved significantly at 3-months (Table 2). A significant improvement in the overall MoCA score and in the executive/visuospatial and delayed recall domains was observed from baseline to 12-months.

Cluster pathways of cognitive recovery
Nine cognitive trajectory clusters were identified based on overall MoCA changes of ≥1 point between assessments (Fig 3). The most common trajectory was that of cluster 3 (i.e. those who improved in overall MoCA from baseline to 3-months, but then declined from 3- to 12-months: "Improved-declined" cluster; 46/119 participants; 39%). Participants in this cluster scored a median (IQR) of 23.5 (6) points in the MoCA at baseline, 28 (3) points at 3-months, and 24.5 (6) at 12-months (Table 2). They displayed the same significant variations in domain-specific scores as the full sample, except for executive function, which increased significantly from 4 (3) points at baseline to 5 (1) points at 3-months (p<0.001), but then declined significantly to 4 (2) points at 12-months (p<0.001), with an overall non-significant change from baseline to 12-months (p = 0.715).
The second most common cluster comprised those who showed a consistent increase in MoCA scores between evaluations (cluster 1; 22/119 participants; 18%). These participants started with a median (IQR) MoCA of 22 (3) points at baseline, and then improved to 25 (1.75) and 27.5 (2.75) points at 3- and 12-months, respectively.

Clinical profiles of trajectory clusters
Post-hoc comparative analyses of the two most common clusters (3 and 1) at each time-point (S3 Table) revealed no significant differences in MoCA scores at baseline (Wilcoxon-Mann-Whitney Z-value = -1.51; p = 0.132), but significant differences in subsequent MoCA assessments at 3-months (Wilcoxon-Mann-Whitney Z-value = -2.89; p = 0.003) and 12-months (Wilcoxon-Mann-Whitney Z-value = 3.28; p = 0.001), as expected. Comparison of secondary evaluations (blood pressure, depression, disability, physical activity, work and social adjustment, and impact of stroke) revealed no significant differences at each time-point between these clusters, except for activity participation (ACS) at 3-months. No significant differences were found in clinical and complementary cognitive outcomes at 3- or 12-months between clusters 3 and 1.
Comparison of clusters with opposite trajectories (clusters 1 and 9) revealed significant differences in MoCA scores at baseline and 12-months, and in Stroop test performance and blood pressure at 3-months (S3 Table).

Unadjusted bivariate models
All factors investigated that had a significant association with cognition can be found in S1 Table. At baseline, better cognition was significantly associated with higher educational level (high school or more). At 3-months, better cognition was associated with higher baseline MoCA, educational level, non-smoking status, higher physical activity level (RAPA strength-flexibility score), and ethnicity (Australia/NZ). At 12-months, better cognition was associated with higher baseline MoCA, education, non-smoking status, younger age, higher physical activity levels, and the absence of premorbid disability.

Adjusted models
Table 3 summarizes the associations between MoCA scores at 3- and 12-months and different baseline predictors, using one predictor at a time, adjusted for baseline MoCA. Potential baseline predictors investigated were all those listed in Table 1. The adjusted analyses presented in Table 3 revealed significant associations between MoCA at 3- and 12-months and records of previous transient ischemic attack (TIA), hypertension, left-hemisphere lesion side, age, and physical activity (RAPA strength score).

Model cross-validation
Model cross-validation revealed an almost identical correlation between predicted and observed scores for both Gamma and quantile regression mixed-effects models (S1 Fig). The model with the best predictive accuracy in both model formulations included lesion side (left), younger age, and higher physical activity (r = 0.7 for mixed-quantile and 0.69 for Gamma regression).

Impact of tPA, anticoagulant or antiplatelet medication on cognition
Lastly, we compared participants on tPA, anticoagulant or antiplatelet medication for any reason (n = 31, including the 6 participants in the EXTEND RCT who received tPA) with those who did not receive any of these medications (n = 88) throughout the study. Our post-hoc analyses yielded no significant differences in overall cognition between the two groups at baseline, 3- or 12-months, and no significant differences in baseline variables. Please refer to S6 Table for more details.

Discussion
Our aims were to characterize patterns of cognitive recovery over the first year post-stroke according to variations in MoCA scores, and to find the best combination of baseline variables predicting future cognitive recovery at 3- and 12-months post-stroke.

Changes in overall and domain-specific cognition at baseline, 3- and 12-months
We described changes in overall raw scores, in addition to changes in the proportion of survivors classified using a binary categorization of cognitive state (i.e. impaired/unimpaired), using a cutoff score of <24 points, as per evidence in Australia/New Zealand cohorts [25]. When analyzing changes in raw MoCA scores, we found significant variations between all assessment time-points (i.e. from baseline to 3-months, from 3- to 12-months, and from baseline to 12-months). Our post-hoc analyses confirmed that the initial improvement in overall MoCA scores was explained by significant increases in the executive/visuospatial, attention, language, abstraction, and recall domains. The subsequent decrease was explained by poorer performance in the attention and recall domains. Binary analyses yielded significant variations in cognitive state (impaired/unimpaired) only from baseline to 3-months and from baseline to 12-months, but not from 3- to 12-months. This finding highlights that using the full range of scores (0-30 points) is a more sensitive approach than binary analyses for capturing changes in cognition between assessment time-points. Further, dichotomizing continuous variables, although a well-known practice in clinical research, is not recommended, as it is highly likely to lead to a loss of statistical power [49,50]. Outcome dichotomization, however, is still a common practice in stroke research [11,12].
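A small hypothetical example shows why the binary classification is less sensitive: declines of 4 points that stay on the same side of the <24 cutoff change the raw score but not the impaired/unimpaired status. The scores are invented for illustration only.

```python
CUTOFF = 24  # MoCA < 24 = impaired, the cutoff used in the text


def impaired(score: int) -> bool:
    return score < CUTOFF


def count_detected_changes(pairs):
    """Return (changes in raw score, changes in impaired/unimpaired status)."""
    raw = sum(1 for a, b in pairs if a != b)
    binary = sum(1 for a, b in pairs if impaired(a) != impaired(b))
    return raw, binary


# Hypothetical (3-month, 12-month) score pairs; no pair crosses the cutoff.
pairs = [(29, 25), (28, 24), (23, 19), (22, 18), (27, 27)]
print(count_detected_changes(pairs))  # (4, 0)
```

Four of the five hypothetical participants change, yet none changes category, so a binary analysis would register no change at all.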
Overall, these findings establish that a significant decline in cognition is possible within the first year post-stroke, even when preceded by a significantly improving pattern. The importance of detecting these changes aligns with current evidence supporting the use of the MoCA early after stroke and across the continuum of stroke recovery [5,51,52]. The MoCA is an informative, brief, easy-to-use instrument that is well known among allied-health professionals, and it has shown good predictive capacity for future disability and mortality [5,8,53].

Trajectory clusters
Pathway cluster analyses revealed that the two most common cognitive trajectories were those of survivors who improved and then declined ("improved-declined" cluster, 39%), followed by those who had a consistent improvement between assessments ("overall improvers", 18%). This finding highlights the relatively high proportion of patients (more than half our cohort) who are at risk of cognitive decline between 3- and 12-months post-stroke, as portrayed by clusters 3 ("improved-declined") and 6 ("stable-declined") (Fig 3). Furthermore, clinical and cognitive comparison between the two largest clusters ("improved-declined" and "overall improvers") revealed that the differences between these two clusters were almost exclusively found in cognitive performance at each time-point, but not in secondary clinical evaluations potentially associated with cognition (i.e. depression scores, disability, physical activity levels, work and social adjustment and stroke impact).
Again, these findings identify cognition as a key distinguishing feature, further highlighting the value of monitoring cognitive symptoms over time.

Model formulation: Quantile versus Gamma models
Our predictive approach performed similarly, and with equal precision, for both the mixed Gamma and the quantile regression formulations. Estimates of cognitive recovery were virtually identical between models across all combinations of baseline factors we evaluated. Although our variables of interest have also been identified in previous studies [9,11,12,28,29,42,54], the cross-validation methodology used herein allowed us to go one step further and evaluate the predictive performance of each combination of baseline variables on each stroke survivor individually. The end result was the ability to predict future scores at 3- and 12-months using variables that are commonly evaluated at admission. A granular modelling approach that incorporates model cross-validation is an important addition to the present state of research in stroke that can help guide practitioners seeking more precise methods of prediction. This approach has potential for future development, as our algorithms have been made publicly available for researchers to modify, improve, and test. In addition, these tools can model skewed, longitudinal outcomes and are flexible enough to predict cognition and any other continuous outcome yielding itemized or composite scores (e.g. disability, depression, pain). As new data and analysis tools become available, the use of mixed Gamma and quantile models should lead to more precise prediction algorithms than popular binary and cross-sectional analyses.
Overall, our predictive findings align with previous evidence describing education, age, initial stroke severity, and physical inactivity as significant predictors of cognitive function [9,11,42]. Lesion side is a relatively newly studied factor [55] that points to a greater sensitivity of the MoCA to left- than to right-hemisphere deficits [56].

Benefit of using mixed Gamma over mixed quantile models for prediction of skewed, longitudinal outcomes
The advantage of Gamma models over LQMM is that they have been optimized to handle offset variables [45]. An offset variable, also referred to as a 'known parameter', has been shown to improve the accuracy of predictions in mixed models [57]. In our case, MoCA at baseline (i.e. the known parameter) could be used to scale the predictions at 3- and 12-months, instead of being used to adjust each model as a fixed parameter (i.e. predictor), ultimately using fewer degrees of freedom and thus yielding less prediction error.
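The difference between an offset and an adjusting covariate can be written out for a Gamma mixed model. This is a sketch under assumptions: a log link (in R, the Gamma family defaults to the inverse link, so the log link here is our choice), a participant-level random intercept $b_i$, outcome $Y_{it}$ at follow-up $t \in \{3, 12\}$ months, baseline score $\mathrm{MoCA}_{i0}$, and other baseline predictors $\mathbf{x}_i$ with coefficients $\boldsymbol{\beta}$. For comparability, the adjusted version is written with log baseline MoCA; the study may have entered the raw score.

```latex
% Offset: baseline MoCA enters with its coefficient fixed at 1
\log \mathbb{E}[Y_{it} \mid b_i] = \log(\mathrm{MoCA}_{i0}) + \mathbf{x}_i^{\top}\boldsymbol{\beta} + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2)
\;\Longrightarrow\;
\mathbb{E}[Y_{it} \mid b_i] = \mathrm{MoCA}_{i0}\,\exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta} + b_i\right)

% Adjustment: baseline MoCA is a predictor whose coefficient \gamma must be estimated
\log \mathbb{E}[Y_{it} \mid b_i] = \gamma \log(\mathrm{MoCA}_{i0}) + \mathbf{x}_i^{\top}\boldsymbol{\beta} + b_i
```

With the offset, the coefficient on $\log(\mathrm{MoCA}_{i0})$ is fixed at 1 rather than estimated, so one fewer parameter is fitted, and each prediction is a multiplicative rescaling of the participant's own baseline score.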

Implications for clinical practice
Overall, our predictive analyses revealed that the amount of physical activity, as assessed using the RAPA [26], was the only modifiable factor independently associated with cognition. This is a key finding of the present research that supports recent evidence underlining the positive impact of physical activity before and after ischemic (and hemorrhagic) stroke on survival and overall functional recovery [58,59]. Physical activity programs for stroke survivors have consistently shown positive associations with cognitive recovery in systematic evidence and in quality trials that have coupled these interventions with behavioral programs [29,60,61]. Both the present study and the accumulated evidence suggest that primary and secondary prevention programs involving physical activity, alongside a protocol for monitoring cognition, may be important not just for cognition but also for overall recovery after stroke. Furthermore, our findings provide additional detail about the positive role of physical activity in cognitive recovery, and possibly in overall post-stroke functional recovery and survival.
Future studies should continue to thoroughly examine the role of physical activity in cognitive recovery and prioritize similar interventions, including a direct comparison of different types of physical activity, delivered at different times of the subacute phase, focusing on defined windows of recovery (i.e. recovery epochs), and at varying intensities, both combined and in isolation. The interaction (i.e. multiplicative effect) with other behavioral approaches such as self-management programs should also be considered, using suitable methods of analysis that are already available from the present study and from previous systematic evidence analyzing the multiplicative effect of different interventions on cognition [13].

Limitations
Although Gamma models have been validated widely in areas of epidemiology and biology [62], LQMM models need further development and validation for the purposes of individual prediction. As detailed in our statistical analysis, we modified LQMM's 'predict' function to calculate scores at 3- and 12-months for a single individual. Though our modifications proved to be effective with the available data from the START cohort in Australia/NZ, and with our external validation using data from Singapore, further validation of the modified function is required, with additional datasets using the MoCA and other continuous outcomes with a skewed distribution.
The limited variability in explanatory variables may also have reduced their apparent predictive value, increasing the risk of spurious variable selection. Our findings may also be masked by the clear ceiling effects of certain MoCA subtests, such as naming (recognition of known objects) and basic orientation (time/space) [53].
Although our study did not include a comprehensive examination of the effects of vascular vulnerabilities and hemorrhagic transformation on cognition, we acknowledge the importance of monitoring these variables and including them in future studies that set out to understand the many facets of post-stroke cognitive trajectory [63,64]. Our findings are only valid for participants with mild stroke severity over the first year of recovery. Monitoring cognition is crucial given the current state of evidence on mild stroke and cognitive decline in the first year post-stroke [65].

Conclusion
Cognitive performance in mild stroke survivors improved significantly from baseline to 3-months, but then decreased significantly from 3- to 12-months post-stroke. The description of clusters of cognitive change in the first year post-stroke has provided new insights into cognitive recovery, highlighting the importance of monitoring changes in cognition over time. MoCA scores during the first year post-stroke were consistently associated with non-modifiable factors such as age and lesion side, and with only one modifiable factor: physical activity. Future predictive studies should focus on predicting individual-level cognitive function and on finding other variables associated with cognitive recovery that are modifiable during the first year post-stroke and beyond.

Table 3. Mixed quantile and Gamma regression models predicting Montreal Cognitive Assessment (MoCA) scores at 3- and 12-months using baseline predictors. Significant associations were found with records of previous transient ischemic attack (TIA), hypertension, left-hemisphere lesion side, age, and physical activity (RAPA strength score).